A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters
Abstract
One of the most difficult problems in the design of an anomaly-based intrusion detection system (IDS) that uses clustering is labelling the obtained clusters, i.e. determining which of them correspond to "good" behaviour on the network/host and which to "bad" behaviour. In this paper, a new cluster labelling strategy, which makes use of a clustering quality index, is proposed for application in such an IDS. The aim of the new labelling algorithm is to detect compact clusters containing very similar vectors, since these are highly likely to be attack vectors. Two clustering quality indexes have been tested and compared: the Silhouette index and the Davies-Bouldin index. Experimental results comparing the effectiveness of a multiple classifier IDS with each of the two indexes implemented show that the system using the Silhouette index produces slightly more accurate results than the system using the Davies-Bouldin index. However, the Davies-Bouldin index is much less complex to compute than the Silhouette index, which is a very important advantage for potential real-time operation of an IDS that employs clustering.
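To make the cost difference between the two indexes concrete, here is a minimal NumPy sketch of both. It assumes Euclidean distance and the standard textbook definitions (mean silhouette coefficient; Davies-Bouldin with each cluster's scatter taken as the mean distance of its members to the centroid); the abstract does not state which variants the paper uses. The helper `most_compact_cluster` is hypothetical: it only illustrates the idea of singling out a tight cluster as suspicious and is not the paper's labelling algorithm.

```python
import numpy as np


def silhouette_index(X, labels):
    """Mean silhouette coefficient (higher is better). Assumes >= 2 clusters.

    Uses every pairwise distance, so cost grows as O(n^2) in the number of vectors.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Full n x n Euclidean distance matrix: the expensive step.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # convention: a point in a singleton cluster gets s(i) = 0
        # a(i): mean distance to the other members of its cluster (D[i, i] = 0).
        a = D[i, own].sum() / (own.sum() - 1)
        # b(i): lowest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return float(s.mean())


def davies_bouldin_index(X, labels):
    """Davies-Bouldin index (lower is better). Assumes >= 2 distinct centroids.

    Needs only point-to-centroid and centroid-to-centroid distances,
    i.e. O(n*k + k^2) for n vectors and k clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # S_i: mean distance of cluster i's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[j], axis=1).mean()
        for j, c in enumerate(clusters)
    ])
    # M_ij: distance between centroids i and j.
    M = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    k = len(clusters)
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [max((scatter[i] + scatter[j]) / M[i, j] for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))


def most_compact_cluster(X, labels):
    """HYPOTHETICAL helper, not the paper's algorithm: return the cluster whose
    members have the smallest mean distance to their centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scatter = [np.linalg.norm(X[labels == c] - X[labels == c].mean(axis=0), axis=1).mean()
               for c in clusters]
    return clusters[int(np.argmin(scatter))]


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    labels = np.array([0, 0, 1, 1])
    print(silhouette_index(X, labels))      # about 0.90
    print(davies_bouldin_index(X, labels))  # 0.1
```

The silhouette computation builds an n x n distance matrix, whereas the Davies-Bouldin computation visits each point once against its centroid and then compares only the k centroids. That gap is what makes the Davies-Bouldin index the cheaper choice when an IDS has to re-evaluate clusters in real time.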
Similar resources
A Clustering-Based Unsupervised Approach to Anomaly Intrusion Detection
In the present paper, a 2-means clustering-based anomaly detection technique is proposed. The presented method parses the set of training data, consisting of normal and anomalous data, and separates the data into two clusters. Each cluster is represented by its centroid: one for the normal observations and the other for the anomalies. The paper also provides appropriate methods for clustering, trai...
Nonparametric Genetic Clustering: Comparison of Validity Indices
A variable string length genetic algorithm (GA) is used to develop a novel nonparametric clustering technique for cases where the number of clusters is not fixed a priori. Chromosomes in the same population may now have different lengths, since they encode different numbers of clusters. The crossover operator is redefined to handle variable string lengths. A cluster validity index is used as a...
Automatic Clustering of Mixed Data Using a Genetic Algorithm
In real-world clustering problems, it is often necessary to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data rather than for mixed data sets. In addition, traditional methods such as the K-means algorithm usually require the user to provide the number of clusters. In this...
Customer behavior mining based on RFM model to improve the customer relationship management
Company managers are very keen to extract hidden and valuable knowledge from their organization's data. Data mining is a new and well-known technique which can be applied to customer data to discover hidden knowledge and information about customers' behaviors. Organizations use data mining to improve their customer relationship management processes. In this paper R, F, an...
Performance Evaluation of Some Clustering Algorithms and Validity Indices
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn’s index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between index I and Dunn’s index, a lower bound of the value...